A Corpus Balancing Method for Language Model Construction

Authors

  • Luis Villaseñor Pineda
  • Manuel Montes-y-Gómez
  • Manuel Alberto Pérez-Coutiño
  • Dominique Vaufreydaz
Abstract

The language model is an important component of any speech recognition system. In this paper, we present a lexical enrichment methodology for corpora, aimed at the construction of statistical language models. The methodology involves, on the one hand, identifying the set of poorly represented words in a given training corpus and, on the other hand, enriching that corpus by repeatedly including selected text fragments that contain these words. The first part of the paper describes the formal details of the methodology; the second part presents experiments and results that validate the method.
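The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formal criteria, so everything specific here is an assumption made for illustration: the function names `rare_words` and `balance_corpus`, the use of a raw frequency threshold `min_count` to decide which words are poorly represented, whitespace tokenization, and treating each sentence as one text fragment.

```python
from collections import Counter


def rare_words(sentences, min_count):
    """Return the words that occur fewer than min_count times.

    The frequency threshold is an assumed stand-in for the paper's
    (unstated here) definition of a poorly represented word.
    """
    counts = Counter(word for s in sentences for word in s.split())
    return {w for w, c in counts.items() if c < min_count}


def balance_corpus(sentences, min_count, max_rounds=10):
    """Return a new corpus enriched by repeated inclusion of fragments.

    In each round, every original sentence that contains a currently
    rare word is appended once more. The loop stops when no rare
    words remain or after max_rounds rounds.
    """
    enriched = list(sentences)
    for _ in range(max_rounds):
        rare = rare_words(enriched, min_count)
        if not rare:
            break
        # Select fragments from the original corpus only, so each
        # round adds at most one extra copy of any sentence.
        selected = [s for s in sentences if rare & set(s.split())]
        enriched.extend(selected)
    return enriched


corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the zebra sat on the mat",
]
balanced = balance_corpus(corpus, min_count=2)
```

In this toy corpus only "rug" and "zebra" fall below the threshold, so the second and third sentences are each included once more. Repeating whole fragments also raises the counts of the frequent words they contain, a side effect that a real balancing method would need to control.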

Similar resources

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor "weeping is a means of liberating contained emotions" is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

An Exploration of Discoursal Construction of Identity in Academic Writing

The view that academic writing is purely objective, impersonal and informational, which is often reflected in English for Academic Purposes materials, has been criticized by a number of researchers. By now, the view of academic writing as embodying interaction among writers, readers and the academic community as a whole has been established. Following this assumption, the present study focused ...

Construction of spoken language model including fillers using filler prediction model

This paper proposes a novel method to construct a spoken language model including fillers from a corpus including no fillers using a filler prediction model. It consists of two submodels: a filler insertion model which predicts places where fillers should be inserted, and a filler selection model which predicts appropriate fillers for given places. It converts a corpus that covers domain-releva...

Dynamic Modeling and Construction of a New Two-Wheeled Mobile Manipulator: Self-balancing and Climbing

Designing self-balancing two-wheeled mobile robots and reducing undesired vibrations are of great importance. For this purpose, most research focuses on the application of relatively complex control approaches without improving the robot structure. Therefore, in this paper we introduce a new two-wheeled mobile robot which, despite its relatively simple structure, fulfills the req...

Publication date: 2003